Catalyst Acceleration for Gradient-Based Non-Convex Optimization
We introduce a generic scheme to solve non-convex optimization problems using
gradient-based algorithms originally designed for minimizing convex functions.
Even though these methods may require convexity to operate, the
proposed approach allows one to use them on weakly convex objectives, which
covers a large class of non-convex functions typically appearing in machine
learning and signal processing. In general, the scheme is guaranteed to produce
a stationary point with a worst-case efficiency typical of first-order methods,
and when the objective turns out to be convex, it automatically accelerates in
the sense of Nesterov and achieves a near-optimal convergence rate in function
values. These properties are achieved without assuming any knowledge about the
convexity of the objective, by automatically adapting to the unknown weak
convexity constant. We conclude the paper by showing promising experimental
results obtained by applying our approach to incremental algorithms such as
SVRG and SAGA for sparse matrix factorization and for learning neural networks.
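
As a concrete illustration, the following is a minimal Python sketch of a Catalyst-style outer loop, with plain gradient descent standing in for an inner incremental solver such as SVRG or SAGA. The function names, parameter values, and toy objective are illustrative assumptions, not the paper's implementation; in particular, the paper's adaptive variant also estimates the unknown weak convexity constant to tune the regularization parameter kappa.

import numpy as np

def catalyst(grad_f, x0, kappa=1.0, n_outer=50, n_inner=100, lr=0.1):
    # Sketch, not the paper's algorithm: at each outer step,
    # approximately minimize the regularized surrogate
    #   f(x) + (kappa/2) * ||x - y||^2
    # around the prox center y, then apply a Nesterov-style
    # extrapolation. The surrogate is strongly convex whenever
    # kappa exceeds the weak convexity constant of f.
    x_prev = x = np.asarray(x0, dtype=float)
    y = x.copy()
    for k in range(n_outer):
        # Inner solver: plain gradient descent on the surrogate.
        z = x.copy()
        for _ in range(n_inner):
            z = z - lr * (grad_f(z) + kappa * (z - y))
        x_prev, x = x, z
        # Classical Nesterov extrapolation schedule.
        y = x + (k / (k + 3.0)) * (x - x_prev)
    return x

# Toy weakly convex objective f(x) = ||x||^2 + cos(sum(x)).
grad = lambda x: 2.0 * x - np.sin(np.sum(x)) * np.ones_like(x)
x_min = catalyst(grad, x0=np.ones(5))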
Hitting the High-Dimensional Notes: An ODE for SGD learning dynamics on GLMs and multi-index models
We analyze the dynamics of streaming stochastic gradient descent (SGD) in the
high-dimensional limit when applied to generalized linear models and
multi-index models (e.g. logistic regression, phase retrieval) with general
data-covariance. In particular, we demonstrate a deterministic equivalent of
SGD in the form of a system of ordinary differential equations that describes a
wide class of statistics, such as the risk and other measures of
sub-optimality. This equivalence holds with overwhelming probability when the
model parameter count grows proportionally with the number of data samples. This
framework allows us to obtain learning rate thresholds for stability of SGD as
well as convergence guarantees. In addition to the deterministic equivalent, we
introduce an SDE with a simplified diffusion coefficient (homogenized SGD)
which allows us to analyze the dynamics of general statistics of SGD iterates.
Finally, we illustrate this theory on some standard examples and show numerical
simulations which give an excellent match to the theory.
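
As an illustrative companion, here is a minimal Python sketch of the streaming (one-pass) SGD setting the paper analyzes: a logistic-regression teacher-student GLM with isotropic Gaussian data. The dimensions, step-size scaling, and covariance are assumptions made for the example, and the deterministic ODE equivalent itself is not implemented here; the simulation merely produces the kind of risk curve that the paper's ODE describes in the proportional high-dimensional limit.

import numpy as np

rng = np.random.default_rng(0)
d, n_steps, gamma = 500, 20000, 0.5            # illustrative sizes, not the paper's
w_star = rng.standard_normal(d) / np.sqrt(d)   # teacher parameters
w = np.zeros(d)                                # student parameters

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

risks = []
for step in range(n_steps):
    x = rng.standard_normal(d)                     # fresh sample at every step
    y = float(rng.random() < sigmoid(x @ w_star))  # teacher label
    # One streaming SGD step on the logistic loss, with the 1/d
    # step-size scaling common in high-dimensional analyses.
    w -= (gamma / d) * (sigmoid(x @ w) - y) * x
    if step % 500 == 0:
        # Monte Carlo estimate of the population logistic risk.
        X = rng.standard_normal((2000, d))
        p, p_star = sigmoid(X @ w), sigmoid(X @ w_star)
        risks.append(-np.mean(p_star * np.log(p + 1e-12)
                              + (1.0 - p_star) * np.log(1.0 - p + 1e-12)))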